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Abstract 

We present an ordinary differential equations approach to the analysis of algorithms for con- 
structing h minimizing solutions to underdetermined linear systems of full rank. It involves a 
relaxed minimization problem whose minimum is independent of the relaxation parameter. An 
■^ . advantage of using the ordinary differential equations is that energy methods can be used to prove 

7 ! ' convergence. The connection to the discrete algorithms is provided by the Crandall-Liggett the- 

ory of monotone nonlinear semigroups. We illustrate the effectiveness of the discrete optimization 
algorithm in some sparse array imaging problems. 
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Introduction 



^ I We consider the solution of large underdetermined linear systems of equations Ax = y where A £ 

^\ . j^mxn jg ^ given matrix, y G M*" is a known vector of m <C n measurements, and x £ M" is the 

unknown signal or image to be estimated. We assume that A has full rank equal to m. We want to 



^SJ ! find the solutions of this system with minimal /i norm ||3;||ii, 

min||x||;j, subject to y = Ax . (1.1) 



Our motivation is array imaging problems, which is an application discussed in this paper, but such 
sparsity inducing constrained minimization problems, where the li norm of the solution vector is 
^ used, arise in many other applications in signal and image processing [14j . 

r> ■ A lot of research has been devoted to developing algorithms for solving efficiently (jl.ip and its 

C^ . relaxed form 

min< r||x||;^ -I- -||y - Ax||^ > . (1.2) 



Here, and throughout the paper, \\q\\ denotes the Z2-norm of a vector q. In ()1.2p . the exact constraint 
y = Ax is relaxed so as to take into account possible measurement noise, and r is a positive real 
parameter that promotes sparse solutions when it is large enough. 

The iterative shrinkage-thresholding algorithm (ISTA) is the usual gradient descent method 
applied to (11. 2p . It has been used in many different applications with great success, such as [T2| 
[T6Hl8t [25 t H7] . just to mention a few. The ISTA algorithm generates a sequence of iterates {x^} of 
the form 

Xfc+i = r]rh{xk - hVf{xk)) ■ (1.3) 
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Here, h is the step size, 



■na{x) 



X — a, 


if X > a, 


0, if 


— a < X < a, 


X + a, 


ii X < —a 



[lA] 



is the shrinkage-thresholding operator, and V/(xfc) denotes the gradient of /(x) = ^\\y — Ax]]"^ at 
the current iterate x^- Thus, V/(xfc) = A*{Axii. — y), where A* denotes the complex conjugate 
transpose of A. The algorithm ()1.3p involves only simple matrix- vector multiplications followed by 
a shrinkage-thresholding step. 

For a fixed value of r the solution to (|1.2p differs in general from the solution of (jl.ip . In other 
words, exact recovery from noiseless data is not achieved by solving (ll.2p . unless the regularization 
parameter r is sent to zero. However, it is well known that the convergence of (jl.3p is slow for 
small values of the parameter r. This issue is considered in detail in [5]. Variants of (II. 3p have been 
proposed to speed up its convergence rate. In [2j, for example, a fast version of ISTA is proposed 
(FISTA, described in more detail below in Section [3]) that has as easy an implementation as (II. 3p 
but has a much better convergence rate. 

In this paper, we present an ordinary differential equations (ODE) approach to an iterative 
shrinkage-thresholding algorithm for solving ^i-minimization problems independently of the regular- 
ization parameter r. We use a generalized Lagrange multiplier, or augmented Lagrangian, approach 
[3l[271[29l[38lll0] to the relaxed problem (II. 2p to impose exact recovery of the solution to (II. ip . The 
exact solution is sought through an efficient algorithm obtained from a min-max variational princi- 
ple, which is a special case of the Arrow-Hurwitz-Uzawa algorithm [I]. We prove that this algorithm 
yields the exact solution for all values of the parameter r. Our only assumption is that the matrix A 
has full rank. The connection of the ODE method to the iterative shrinkage algorithm is provided 
by the Crandall-Liggett theory [15], which analyzes the convergence of an implicit finite difference 
discretization of the ODE. The theory works for infinite dimensional, monotone nonlinear problems 
as well. The performance of the algorithm, with and without noise in the data, is explored through 
several numerical simulations of array imaging problems. 

The min-max variational principle used here is also behind the Bregman and linearized Bregman 
iterative algorithms [28 1 I37 1 H7 1 I48). The fully implicit version of the algorithm is also analyzed in 
detail in |13p23j using different techniques. Many other methods have been proposed in the literature 
to solve (jl.ip and (|1.2p with large data. We just mention here some of them: homotopy [22 t l36 l H3]. 
interior-point methods [46], gradient projection [26], and proximal gradient in combination with 
iterative shrinkage-thresholding [2ll341[35] . A detailed discussion and analysis of monotone operator 
splitting methods can be found in [39] . 

Finding the constrained, minimal /i norm solution in (jl.ip does not imply that this solution 
vector has minimal support, even though the /i norm is sparsity promoting. Nevertheless in many 
applications, in imaging in particular, this optimization method does produce the minimal support, 
or minimal Iq norm solution. The theory of compressed sensing [6ti9]l20 1 [2 H l45j gives conditions under 
which constrained h and Iq minimizations are equivalent. We do not address this issue here. 

The paper is organized as follows. In Section [2] we motivate our approach, summarize our main 
results, and describe the numerical algorithm. Theorems 12.61 and 12.41 are the main results of this 
paper. A key ingredient in the proof of these theorems is Theorem 12.71 proved in Section HI The 
proof of the variational principle of Theorem 12.21 is presented in Section [6l This result is originally 
due to [lO] but we present it here for the convenience of the reader. In Section [3] we show the 
performance of the algorithm with and without noise in the data using some numerical experiments 
of array imaging. Finally, Section [7] contains conclusions. 
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2 Formulation and main results 

We consider the constrained optimization problem (jl.ip under the assumptions that (jl.ip has a 
unique minimizer x, and that A has full rank: the matrix AA* is invertible. 

2.1 The min-max variational principle 

In order to find the minimizer x, we recall the variational formulation of the /i-minimization problem 
[HlEHlEHllin]. Define the finction 

F{x,z) = t\\x\\i^ +^\\A^x -yf + {z,y- Ax) , 

for X G M"- and z £W^, and set 

F = ioaaxin.m{F{x,z)} . (2-1) 



Proposition 2.1 The problem /i2.1\} has a solution, that is — oo < F < +oo, and the max-min is 
attained. 

Proof. The function F{x,z) is convex in x, and liuix^oo F {x , z) = +oo, for any fixed z. Thus, 
F{x, z) attains its minimum for a fixed z. Let us denote 

l{x)=T\\x\\i, + ^\\Ax-yf, (2.2) 

and 

h{z) = minF(x, z) = min[Z(x) + {z,y — Ax)]. (2-3) 

X X 

As the function l{x) is convex, and l{x) — )■ +oo, as |x| — )• oo, it follows that h is concave, as a minimum 
of affine functions, and h{z) — t- — oo, as \z\ — t- oo. Thus, it attains its maximum max^ h{z). D 

In order to motivate the functional ()2.ip we look at another natural way to impose the constraint 
in ()l.ip by using a Lagrange multiplier. If we consider a functional 

T\\x\\i^ + {z,y-Ax), (2.4) 

then (at least, formally) its Euler-Lagrange equations for the extremum give us the suh- differential 
optimality condition 

[A*z]^ = \ ^'if^^^_>0' and|[AHI<r. (2.5) 

[ -r, if Xi < 0, 

It is, however, difficult to work with (12. 4p . because if some of the entries of A*z are larger than r in 
absolute value, then ()2.4p is not bounded from below as a function of x. Further, even if z is chosen 
according to the sub-differential condition (12. Sp . then the minimum may not be unique, even if A 



is invertible. Indeed, consider a simple example: minimize |x| subject to x = 1. Suppose r = 1, 
then (j2.4p is |x| + z{l — x). Then z = 1 satisfies the sub-differential condition, and (|2.4p becomes 

I i^n ^ / 1> if^>0, 
\x\ + (1 -x) = < 

I 1 - 2x, if X < 0, 



which has no minimum. The addition of a quadratic term to (j2.4p regularizes this degeneracy. Since 
the function l{x) in (12. 2p is convex, (12. 3j) may be interpreted (up to a sign) as a generalized Legendre 
transform of l{x). 

The first observation is that if (jl.ip has a unique minimum x then the variational principle (j2.ip 
finds x exactly. 

Theorem 2.2 Assume that U.l\} has a unique minimum x. Then we have 

r||x||;j = maxminF(x, z) (2-6) 

2 X 

Moreover, we have t||x||;^ = F{x,z) for any z, and if minx F (x , z) = ''"H^llh for some fixed z, then 
argmin^ F{x, z) = x. 

This result can be found in [40] in a much greater generality. We present its proof below in the 
particular case we are interested in, for convenience of the reader. 

It is remarkable that (12. 6p holds for any value of r > - this gives us a freedom to choose r 
large or small, depending on a particular application. We also have the following well known result 
|40j . which follows from the proof of Theorem 12.61 below. 



Theorem 2.3 Assume that ^.1\) has a unique minimizerx. Then, there exists a vector z such that 
[A*z\i = sgn{x) if x y^O, and \[A*z]i\ < 1 if Xi = 0. 

We say that z satisfies the sub-differential condition if there exists a minimizer of p.ip such that 

[A*z]i = rsgn(x) ff x / 0, and \[A*z]i\ < r ff x, = 0. (2.7) 

We note that (12. 7p is weaker than the sub-differential condition of [^ - there it is required that 
|[j4*2;]j| < r if Xj = 0, while we do not require the strict inequality. It follows from the proof 
of Theorem 12.31 that the exact extremum of F{x, z) is achieved for any z that satisfies the sub- 
differential condition (j2.7p . Such z is not unique but, of course, our interest is not in finding z but 
in finding the minimizer of p.ip . 

2.2 The ordinary differential equations method 

In order to find x, ideally, we would like to take the ODE point of view and generate a trajectory 
{x{t),z{t)) of the following system 

^ = -V,F(x,z), ^ = V,F(x,z), (2.8) 

with the hope that x(i) — )• x as t — )• +cxd. There is an obvious degeneracy in the problem, namely, 
F{x, z) = r||x||/j for all z G W^. Hence, we can only hope to recover x as there is no "optimal" z. 

The obvious technical difficulty is that the function F{x, z) is not differentiable in x at the points 
where Xj = for some j = 1, . . . , n. Following |15j . we interpret solutions of (|2.8p as follows. Given 
X G M", let the sub-differential 5||x||;j be a subset of W^: 

d\\x\\i^ = sgn(xi) X • • • X sgn(x„). 



Here sgn(s), for s G M, is understood a subset of M: sgn(s) = {1} if s > 0, sgn(s) = {—1} if s < and 
sgn(s) = [—1, 1] if s = 0. Then, instead of treating the system of ODEs (|2.8p with a discontinuous 
right side, we consider 

flT 

__A*(z-Ax + y)e-ra||x|b„ (2.9) 

dz 

—r = y — Ax, 

dt ^ 

supplemented by the initial data a;(0) = xq, z{0) = 0. We say that (x(t),z(t)) is a strong solution 
to (|2.9|) on a time interval < t < T if a;(t) and z{t) are continuous, differentiable for almost all 
t G [0,r], x{0) = xo, z{0) = 0, and (USD holds for almost all t £ [0,T]. 

An important observation is that (j2.9p is contractive, or, accretive in the sense of Crandall 
and Liggett |15j. That is, the following property holds: given any pair {xi,zi), {x2,Z2) and any 
^1 G 5||xi||i,, 6 G 9||x2||/i, we have: 

{A*{zi - Axi) - r^i - A*{z2 - Ax2) + t^2) ■ {xi - X2) - {Axi - AX2) ■ {zi - Z2) 

= -T{il - 6) • (^1 - X2) - \\A{XI - X2)f < 0. (2.10) 

The last inequality above follows from the component-wise monotonicity of the sub-differential 
9||x||ij. It follows from (I2.10p and Theorems I and II of [l5] that (12. 9p has a unique strong so- 
lution. Our first result shows that this solution converges as t — >• +00 to x, the minimizer of (jl.ip . 

Theorem 2.4 Let M-l\) have a unique minimizer x. Then, for any 5 > there exists T = T{5) 
such that the solution of (12. 9p satisfies 

\\x{t)-x\\<6, for allt>T. (2.11) 

The time T{5) depends only on 5, the initial data xq, and ||j4j4*|| hut not on the dimension n. 

2.3 The discrete algorithm 

We consider the following numerical algorithm to solve (j2.9p : 

Xk+l - Xk 



At 

Zk+l — Zk 

At 



-TCk+i + A*{zk+i + y-Axk+i), (2.12) 

y - Axk+i, 



with the initial data xq = x, zq = 0. Here, £,k+i is a vector in the set 9||xfc_|_i||;^. 
A simple way to understand how (j2.12p works is to consider the toy problem 

r = — sgnr. (2.13) 

An explicit discretization 

rk+i - rk _ _ 

At ~ ^'^' 

with ^k £ sgn(rfc), will start oscillating around r = as soon as r^ G [—At, At], and will never 

converge to x = for At > 0. On the other hand, the implicit discretization 

"-^^^ = -a+i, (2.14) 

5 



with (,k+i G sgn(rfc+i) behaves differently. If r^ E [— Ai, At], the implicit nature of this scheme shows 
that it is impossible to have £,k+i = il, which forces ^k+i = fkl ^t and r^+i = 0. The implicit 
scheme is actually equivalent to soft thresholding: 

rk+i = r]Atirk)- (2.15) 

The function rjg here is defined by (jl.4p . This simple example already shows both the importance of 
using an implicit discretization, and that the implicit scheme has a simple explicit realization (j2.15p . 
Theorems I and II of [H] not only provide existence of a strong solution to (|2.9p but also show 
that it can be found by the implicit scheme (j2.12p . 

Proposition 2.5 Solution of \2.1S^) converges as At — )■ 0, uniformly on finite time intervals, to the 
unique strong solution of Ii2.!. 



Theorem 12.41 and Proposition 12.51 together imply immediately the following theorem. 

Theorem 2.6 Let the sequence x„,, Zn solve i2.12\) with the initial data xq = x, zq = 0. Given 
any (5 > there exists h > and T > 0, so that for all < At < h and all k > [T/At] we have 
\xk — x\ < 6. The time T depends on 5, the initial data x £ M", and the norm ||^^*||. 

If one examines the proof of Theorems I and II in [15j, it is clear that the only term that 
should be discretized implicitly is sgnx - the other terms can be discretized explicitly, keeping the 
statement of Proposition 12.51 intact. Hence, the result of Theorem 12.61 applies equally well to an 
Euler quazi-explicit modification of ()2.12p that is easier to implement numerically: 

xfc+i = xk- Ck+i + AtA*{zk + y - Axk) , 

Zk+i= Zk + At{y - Axk), (2.16) 

where ^k+i G ''"AtO||xfc+i||i^ is a vector in the subdifferential of rAt ||xfc+i||zi. We call this scheme 
the generalized Lagrangian multiplier algorithm (GeLMA). As in the toy problem ()2.13p - ()2.15p . it 
is equivalent to soft thresholding: 

xk+i = VrAt {xk + AtA*{zk + y- Axk)) , 

Zk+i = Zk + At{y- Axk). (2.17) 

This scheme converges if At < 1/| |^| | - that condition simply comes from the usual constraint for an 
explicit scheme for a linear system. GeLMA algorithm is extremely easy to implement numerically. 
We also note that one can mimic the ODE proof of Theorem 12.41 directly on the numerical 
scheme, eliminating, in particular, the dependence of h on 6. Our objective, however, in part, 
is to explain the effectiveness of shrinking-thresholding algorithms in the language of differential 
equations, potentially opening the way for the application of other continuous techniques in such 
problems. Therefore, we have chosen to concentrate on the ODE proof. 

2.3.1 The regularized ordinary differential equations 

Since the system (12. 9p has a "bad" right side, working with it directly is technically inconvenient. 
Hence, in order to prove Theorem 12.41 from which Theorem 12.61 follows, we consider a regularized 
system, introducing a single-valued approximation of sgnx: 



Geis) 



1, if s > e, 

s/e, if \s\ < e, 
— 1, if s < — e. 



Here e > is a small regularization parameter that will be sent to zero at the end. With a slight 
abuse of notation, here, and in other instances when this should cause no confusion, we will also 
denote by Ge(x) a vector valued function with components Ge(x) = (^^(xi), Ge(2;2), . . . ,Ge(x„)). 
The regularized version of ()2.9p is 

Hif n z 

-^ = -rGe {xe) + A*{z + y- Ax^) , -^ = V - ^x,. (2.18) 

It has the same form (|2.8p . with F{x,z) replaced by a differentiable approximation 

n 

F,{x,z) = r^r,(x,-) + f{x) + {z,y - Ax). (2.19) 

i=i 

Here, 

|s|, if s > e, 
re{s) = { sV(2e)+e/2, if |s| < e, 
|s|, if s < — e, 

is an approximation of |s| known as the Huber function. We will denote below 

n 

\\x\\ii=^r,{xj), (2.20) 

i=i 

though, of course, this is not a norm as it does not vanish at x = 0. 

Theorem 2.7 Let U.l\) have a unique minimizer x. Then, for any 5 > there exists Eq = eQ{5,n) 
and T = T{6) such that for any e, < e < Eq the solution of (j2.18p satisfies 

\\x^{t) -x\\ <6, for allt>T. (2.21) 

The time T{5) depends only on 6, the initial data xq, and ||ylyl*|| but not on the dimension n. 

When the minimizer of (jl.ip is not unique, the proof of Theorem 12.71 can be easily adapted to show 
that for any 5 > Q there exists eo('^) such that for any e £ (0,eo) and any limit point (as t — )• +oo) 
Xe of the trajectory Xs{t), we have ||xe — x|| < (5 for some minimizer x of (jl.ip . 

Theorem 12.71 is the key ingredient in the proof of Theorem 12. 6t together with a priori bounds on 
Xe(t) obtained in the course of its proof, they show that solution x(t) of (j2.9p is the limit of Xe(t) as 
e — )■ 0, and thus it obeys the same bounds as Xe(i), finishing the proof. 

3 Application to array imaging 

In this section we illustrate the performance of our algorithm for array imaging of localized scatterers. 
The problem is to determine the location and reflectivities of small scatterers by sending a narrow 
band (single frequency) probing signal of wavelength A from an active array and recording the 
backscattered field on this array [4J. In this paper we consider only single illumination by the central 
element of the array. 



3.1 Array imaging in homogeneous media 

The array has N transducers located at positions Xp [p = 1,...,A^) separated from each other 
by a given distance. In each numerical experiment there are M point-like scatterers of unknown 
reflectivities pj > located at unknown positions y^ (j = 1 , . . . , M) . The scatterers are assumed 
to be within a bounded region at a distance L from the array, called the Image Window (IW). 
We discretize this IW with a uniform mesh of K points y,- (j = 1, . . . ,K), and assume that each 
scatterer is located at one of these K grid points, so {y^^ , • • • , ynm} "^ {Vi^ • • • > Vk}- 

Furthermore, we assume that the medium between the array and the scatterers is homogeneous 
so wave propagation between any two points x and y is modeled by the free space Green function 

^ , N exp(— mla; — ■yl) /„ -x 

^.■n\x — y\ 

where k = w/c = 27r/A, and c is the reference wave speed in the medium. We also assume that the 
scatterers are well separated or are weak, so multiple scattering among them is negligible (this is 
the Born approximation). Under these conditions, the backscattered field measured at x^ due to a 
pulse sent from aj^, and reflected by the M scatterers in the IW, is given by 

M 

briu) = ^pjGo{xr,y^^,u))Go{y„^,Xs,u)) . (3.2) 

j=i 

Next, we write the linear system that relates the reflectivity poj at each grid point y • of the IW 
(j = 1, . . . , K) and the data bj-ioj) measured at the array (r = 1, . . . , N). To this end, we introduce 
the reflectivity vector Pq = (poi, /002, • • • , Pok)^ £ K^ and the data vector h{oj) = (6i, 62, • • • , ^A^)"^ £ 
R-^, where the superscript T means transpose. Thus, the image is a gridded array of K pixels, 
and the data is stacked into a vector of A^ <C i^ components. Furthermore, there are only a few 
scatterers in the IW so the vector pg is sparse. 

Let us consider the vector 

aoiVj,^) = {Go{xi,yj,uj),Go{x2,yj,uj),...,Go{xN,yj,uj))^ , 

that represents the signal at the array due to a point source at t/,- in the IW. Due to the spatial 

reciprocity Go{xi,y,oj) = Go{y,Xi,u)), it can also be interpreted as the illumination vector of the 
array at position t/-. With this notation, we can write the linear system 

A^Pq = b{u) , (3.3) 

where ^^ is an A'^ x ii' matrix whose j*^ column is given by Go{yj,Xs,co)gQ{yj,co). Since N <^ K, 
()3.3p is an underdetermined linear system, and hence there can be many configurations of scatterers 
that match the data vector b{uj). Array imaging is to solve (j3.3p for pQ. 

A related problem to (13. Sh has been studied in [10] in array imaging of localized scatterers from 
intensity-only measurements. Intensity measurements are interpreted as linear measurements of a 
rank one matrix associated with the unknown reflectivities. Since the rank minimization problem 
is NP-hard, it is replaced by the minimization of the nuclear norm of the decision matrix. This 
makes the problem convex and solvable in polynomial time. It is shown that exact recovery can be 
achieved by minimizing this problem. 



3.2 Numerical Simulations 



We consider here numerical experiments in 2D. Our linear array consists of 100 transducers that are 
one wavelength apart. Hence, the aperture of the array is o = 100. In each numerical experiment 
there are a few point-like scatterers of different reflectivity at a distance 120 from the array. The 
IW is discretized with 41 x 41 grid points. Hence, we have 1681 unknowns and 100 measurements. 
All the spatial units are expressed in units of the wavelength A of the illuminating source. 

Fig. [1] shows results from various scatterer's configurations with no noise in the data. In the 
top row we display the original scatterer's configurations and in the bottom row the corresponding 
images obtained by the ii minimization GeLMA algorithm (I2.17p . These results show that this 
algorithm recovers the positions and reflectivities of the scatterers exactly when there is no noise 
in the data. To examine this issue more clearly we plot in Fig. [2] the vector solutions p (green 
crosses) and the exact vectors pg (blue circles) for these three scatterer's configurations. There 
is not apparent difference between the exact and recovered solutions. Both, localization (support 
recovery) and strength estimation (reflectivities) are solved exactly in all the cases. 
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Figure 1: Top row: original configurations of the scatterers within the 41 x 41 IW. Bottom row: 
recovered images obtained by the li minimization GeLMA algorithm (j2.17p with no noise in the 
data.r = 20\\Alh{uj)\\i^ and Nu^r = 300. 



Figure 2: Comparison between the exact solutions (blue circles) and the solutions obtained with the 
GeLMA algorithm (green crosses) with no noise in the data. 



An interesting feature of the GeLMA algorithm ()2.17l) is that it attains the exact solution of 
the basis pursuit problem for large values of the regularization parameter r. This speeds up the 
convergence rate. Informally, this speed-up of convergence can be seen from the coercivity estimate 
(j2.10p and the error estimate (j4.7p . Note that for other popular gradient based algorithms, such 
as ISTA or FISTA [.2j, r has to be smaller than ||A^b(Li;)||;^. Otherwise, they converge to the 
(maximally sparse) zero solution p = 0. To examine this property in more detail, we show in Fig. [3] 
(left panel) plots of the £2 distance to the exact solution ||p — Pq|| as function of the iteration number 
for various values of r = a||^J^b(a;)||i^: a = 2 (solid line), a = 5 (dashed line), a = 10 (dot-dashed 
line), and a = 20 (dotted line). We observe that the larger the value of r is, the faster is the 
convergence rate. Furthermore, for all the values of r the algorithm achieves the exact solution pg. 

In Fig. [3] (right panel) we compare the convergence rates of the GeLMA algorithm and the 
FISTA algorithm 



,(fc) 



riraM'^-akVfiC^'^)), 



"fe+l 



l + Jl + 4al 



^{k+i)^pik)^^k^^p(k)_pik-i)^^ 



Ofc+l 



(3.4) 

(3.5) 
(3.6) 



for r = 0.01||^|^b(a;)||;^. We choose a small value of r because we are considering noisefree data 
in these examples. In ()3.4p - (j3.6p . p^ and ^2 = Pi are given, and ai < 2/L. We observe that the 
convergence rate of the FISTA algorithm (solid line) for r = 0.01||A^b(a;)||;_^ is much slower than the 
convergence rate of the GeLMA algorithm for r = 20||A^b(a;)||i^. Even more, the FISTA algorithm 
with r = 0.01||^|^b(a;)||;^ does not obtain the exact solution. To achieve the exact solution, we 
would have to let r — )• 0. 





iteration 



Figure 3: Right: Plots of the convergence rate of the GeLMA algorithm for various values of 
T = a ||^(^b(a;)||;^: q = 2 (solid line), a = 5 (dashed line), a = 10 (dot-dashed line), and a = 20 
(dotted line). Left: Gomparison of the converge rates of the GeLMA algorithm with a = 20 (dotted 
line) and the FISTA method with a = 0.01 (solid line). In these numerical experiments we have 
used the four scatterers configuration shown in the top right image of Fig. [TJ Noiseless data. 



Next, we examine the performance of the GeLMA algorithm under noise contaminated data 
b{u}) + e{uj). The noise vector e(a;) is generated by independent Gaussian random variables with 
zero mean and standard deviation /3 ||&(ti;)||/viV. Here, /3 is a parameter that measures the noise 
strength. In Fig. HI we show the results for /3 = 0.05 (left column), /3 = 0.1 (middle column), and 
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P = 0.3 (right column). For a fixed step size At, the regularization parameter t = a ||^tLb(w)||i^ 
controls the sparsity of the solution. Hence, one expects the algorithm to be more stable with respect 
to additive noise when r is large. We plot in Fig. IHthe recovered images using different values of 
r: a = 2 (top row), a = 20 (middle row) and a = 100 (bottom row). We observe in the top row 
that the location of the scatterers is recovered exactly when there is 5% noise in the data (left plot). 
The recovered reflectivities are also quite close to the real ones. However, when the noise increases 
to 10% (middle plot) one scatterer is missing in the recovered image that also shows some ghost 
scatterers. As expected, the image gets worse when the noise is 30%, as can be seen in the right 
plot. The results are much better when we increase the value of a to 20 (middle row). With 5% 
noise in the data (left plot) both the location and reflectivities of the scatterers are very close to 
the real ones. Even with 10% noise in the data (middle plot) we can determine the location of 
the four scatterers. However, with 30% noise we miss the forth scatterer. Finally, we show in the 
bottom row the recovered images using a = 200. For 5% and 10% noise (left and middle images, 
respectively), the location of the scatterers is exact. Furthermore, the recovered reflectivities are 
very sharp. However, we still miss the location of one scatterer when there is 30% noise in the data, 
as can be seen in the right image of the bottom row of this figure. We plan to investigate in detail 
the robustness of the algorithm with respect to noise in a future publication. 




I 




I 

■TO. 



t 

■ -0. 

■ -0. 

■ -0. 
-0. 

t 




Figure 4: Impact of the regularization parameter r = a ||^|^b(a;)||/^ on the reconstructions for 
different amounts of noise in the data. Top row: Recovered images with a = 2 and 5% noise (left), 
10% noise (middle) and 30% noise (right). Middle row: same as top row but for a = 20. Bottom 
row: same as top row but for a = 200. 
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4 Proof of Theorems [231 ETS] and EZl 

Theorems 12.41 and 12.31 are easy consequences of Theorem 12.71 and its proof. 

4.1 Outline of the proof of Theorem 12.71 

Let X be the unique minimizer of (II. ip We write x^ = x + q^ and obtain 

^ = -tG, {x + q,) + A* {ze - Aq,) , ^ = -Aq, . (4.1) 

Our goal is now to show that qe{t) — t- as t — )■ +00. If we take the time-derivative of the first 
equation in (14. ip . and use the second equation, we obtain: 

iie + A*A{qe + qe) = -Tg' (x + qe) qe. (4.2) 

Here g^{x) is a diagonal matrix with the entries on the main diagonal given by 

9iiix) = { ,, ., I I (4-3) 

I l/e, it \xi\ < £. 

Note that ()4.2p is simply an equation for an oscillator with friction, and a forcing term in the right 
side. As the matrix A* A is singular, the oscillator is degenerate. Therefore, it is reasonable to expect 
that the friction term A* Aqir in (|4.2p by itself would ensure that ^ge(i) — )• as t — )• +00, provided 
that the forcing does not interfere. However, the friction alone can not send q'e(i) to zero since it is 
degenerate. Moreover, in showing that qe{t) becomes small as t — >• +cxd, one has to use the fact that 
x is the minimizer of (jl.ip and not just any solution of Ax = y. The strategy of the proof is (i) to 
establish uniform bounds on qdt) and Ze(t), and (ii) show that any limit point of qe{t) as i — )• +00 
is close to zero. 

The a priori bounds are obtained in several steps. We first describe the required intermediate 
lemmas, and present their proofs later. The first step in the proof is the following lemma that 
provides a Lyapunov function for (j4.ip and establishes a bound on {{Aq^ 



Lemma 4.1 There exists a constant Cq > that is independent of e (and depends only on the initial 
data xq) so that 



^ + \\Aq,{t)f+ / \\Aq,{s)fds<Co, (4.4) 

Jo 

for all e < Eq andf all t > 0. 

The bound on ||Aq£|| in Lemma l4. 1 1 leads to a uniform bound on -Ze(t). 

Lemma 4.2 There exists a constant C > that is independent of e > so that \\zi;(t)\\ < C for all 
t> 0. 

The next step is to show that ^(?e(t) is small for large times. Since i^ = Aq^, it follows from 
Lemma 14.21 that 

/ Aqs{s)ds 
Jti 



is uniformly bounded for all ii^2 > 0. Together with the integral bound on ^(^e(t) in Lemma l4.1 
this shows that Aqi;{t) becomes small at some "not too large" time. 
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Lemma 4.3 There exists two constants Ci^2 > that are independent of e £ (0, Eq) so that for any 
k £ N there exists a time t^ < Cik"^ such that for all t £ {tk,tk + C2^) we have \\Aqs(t)\\ < Ci/k for 
all £ < Eq. 

Next, using the bounds in Lemmas 14.11 and 14. 2^ as well as the precise form of the forcing term in 
?2|). we obtain a uniform bound for \\q^ 



Lemma 4.4 There exists a constant C > so that we have 

\\x + qs{t)\\<C, (4.5) 

for all t > and all e > 0. 

The bound on ||(?e(t)|| ahows us to strengthen Lemma [4.31 to include a bound on (?e(i) "at some 
times" as well. 

Lemma 4.5 There exists a constant C > that is independent of £ £ (0,eo) so that for any k £N 
there exists a time Sk < Ck^ such that \\Aq^{sk)\\'^ + ||(7e(sfc)|p < C/k for all e < Eq. 

The Lyapunov function in Lemma |4. II and Lemma 14.51 together imply that qe{t) and ie(i) are not 
only "small sometimes" but rather tend to zero as t — )• +oo 

Corollary 4.6 There exists a constant C > that is independent of e £ (0, eo) so that for any 
re E N there exists a time Sn = Sn{£) < Cn^ such that \\Aqi,{s)\\'^ + ||g£(s)|p < C/n for all e < Eq and 
all s > Sn- 

Corollary 14.61 shows that the right side of the ODE system ()4.ip is small as t — )• +oo. The final step 
in the proof is to show that this implies that qe{t) is small, and it is here that the condition that x 
is the minimizer of (jl.ip comes into play. 

4.2 The end of the proof of Theorem 12.71 

It follows from Corollary 14.61 that for any 5o > there exist T = T{5o), and eo = ^o('5o) 

\\A*Z,{t) - rGeix + qeitm < ^0, Pge(i)ll < '^0 (4.6) 

for all £ < £o and t > T. The first inequality in ()4.6p implies 

\{A*Ze{t) - TGeix + q,{t)) ' {x + qe{t))\ < 6o\\x + g,(t)||. 
Using the second inequality from (j4.6p in 

\A*Z,{t) ■ {x + qe{t)) - A*Ze{t) ■ x\ < 5o\\Ze{t)\l 

and denotingll 

"Reix) = y^^XiGejXi), 

i 

we obtain 

|tH(x + qe{t)) - A*Ze{t) ■x\<6o {\\Zs{t)\\ + \\x + q,{t)\\) . 



^The quantity ^^(a;) plays essentially the same role as ||a;||;i defined in (|2.20|) . They are, however, quantitatively 
slightly different. 
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It also follows from the first inequality in ()4.6p that 

\\A*Zemi^<T + 5o, 

and thus 

\{A*z,{t)-x)\<{T + 5o)\\x\\i,. 

As a consequence, 

Hx + Qeit)) - \\x\\l, < - {\\x\\h + W^sim + ||X + qe{t)\\) . 

T 

and therefore 

\\x + qeit)\\i^ -\\x\\i, < — {\\x\\i, + \\z,{t)\\ + \\x + q,{t)\\) + eon. (4.7) 

r 

Here n is the dimension of q^. As a; is the unique minimizer, for any 5 we can choose a and 5q 
sufficiently small so that estimates 

\\x + qs{t)\\h - Mh < a> \\^Qs{t)\\ < So 

imply that ||g£|| < 5. Hence it remains to use uniform boundedness of x + qs{t) and Zi;{t) and choose 
5o and eo so that 

— {\\x\\i, + \\zeit)\\ + \\x + qeit)\\) + eon < a. 

T 

This finishes the proof of Theorem 12.71 except for the proof of Lemmas 14.1114.51 and Corollary 14.61 D 

4.3 Proof of Theorem [2741 

Fix Tg such that |(7e(t)| < S for all T > T^. We know from the Arzela-Ascoli theorem that qe{t) — s- q{t) 
and Ze — )• z{t) uniformly on [0, T^], after extracting a subsequence, and the functions q{t) and z{t) 
are Lipschitz on [0,T5], with the Lipschitz constant independent of (5 > 0. The second equation in 
(j4.ip . and the dominated convergence theorem imply that 

z{t) = - Aq{s)ds, (4.8) 

Jo 

whence 

z = -Aq, z{0) = 0. (4.9) 

The family /e(t) = Ge{x + qe{t)) is uniformly bounded in L'^[0,Ts]. Hence, after possibly ex- 
tracting a subsequence, it converges weakly in L^[0, T^] to a limit f{s). The (vector-valued) function 
f{s) satisfies the following properties: (i) —1 < fj{t) < 1, for all < i < Ts, I < j < N, and (ii) if 
qj{t) / —Xj then fj{t) = sgn{xj + qj)- It follows that for any < ti < t2 < Tg we have 

q{t2) - q{ti) = -r r f{s)ds + / ' A*{z{s) - Aq{s))ds. (4.10) 

Jti Jti 



The aforementioned properties of f{t) imply that x{t) = x + q{t) is a strong solution of (j2.9p . 
Uniqueness of the strong solution [15] implies that the whole family Xe{t) = x + qs{t), Zir{t) converges 
to the solution of (12. 9p . The conclusion of Theorem 12.41 now follows from Theorem 12.71 D 
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4.4 Proof of Theorem [2731 

Theorem 12.71 implies that as e — )• and t — )• oo, along a subsequence, we have 2^^, 
Then the first estimate in (|4.6p implies that A = z/t satisfies 

[A*A]j = sgnxj, if Xj 7^ 0. 
-1< [A*\]j < 1, ifxj = 0. 

This completes the proof of Theorem 12.31 D 



z and Qs^. — )• 0. 



(4.11) 



5 Proofs of auxiliary lemms for the proof of Theorem 12.7 

5.1 Proof of Lemma 14.11 

Multiplying (j4.2p by (jeit), gives 



ld_ 
2dt 



{Ueim' + \\Aqs{t)f) = -\\Aq, 



T{g''{x + qe)qe,qe)- 



Let 



Nl= / {g^{x + qe{s))qe{s)Ae{s))ds = y2 g'jj{xj + qej{s))\qe,j{s)\'^ds >{), 
Jo j^i Jo 

then integrating (|5.ip in time we get 

i (l^.(O)P + WAqMf) = \ {}MT)f + \\Aqe{T)f) + rN^ + £ \\Aq,fdt, 



(5.1) 

(5.2) 



(5.3) 



and (j4.4[) follows. Note that ||g£(0)|| is uniformly bounded in e > since the function Gs{s) takes 
values in the interval [—1,1]. D 



5.2 Proof of Lemma 14.21 

Differentiating the second equation in (j4.ip we obtain 

4 + AA*{Ze + Ze) = tAGs {x + qe{t)) . 
Let us multiply this equation by e* and integrate, to obtain 

/ e''ze{s)ds + e*AA*Ze{t) = TA I e^Ge{x + qe{s))ds, 
Jo Jo 

since 2;(0) = 0. We estimate, using (|4.4p : 

pt / r-t t-t \ 1/2 / I't \ i/^ 

/ e^ie{s)ds <{ e^^ds \\ze{s)fds\ < C'eM / \\Aqe{s)fds\ <Ce\ 
As \Ge,j\ < 1 for all 1 < j < n, we also have 



(5.4) 



(5.5) 



1/2 



A / e'Ge{x + qe{s))ds 
Jo 



< Ce\ 



Since the matrix AA* is invertible, we obtain from (|5.5p that ||z£(i)|| < C. D 
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5.3 Proof of Lemma 14.31 



Let us set yeit) = ^^^(t). As Zs{t) is uniformly bounded, there exists a constant C > that is 
independent of e so that 



rt2 

/ ye{s)ds < C, 
■Jti 



for all < ti < ^2- If we take an integer n = Ck, we have 



1 



n 



t+n 



ye{s)ds 



1 



(5.6) 



(5.7) 



for all i > 0, and 

]^ ft+n 



vsim < 



n 



ye{s)ds 



1 rt+n / j-s \ 1/2 1 / i-t+n \ 1/2 

(5.8) 
Lemma |4. II implies that given n there exist at most Ck'^n = Ck^ integers / such that 



1 



l+2n 

\\ye{s)fds> . 

I 4fc^n 



It follows that there exists ko < Ck^ such that 



fco+2n -j^ 



Then, for all t G (/cq, A;o + n) we have 



whence 



t+n 



\ye{s)fds< 



4A;2 



n 



for all t £ (ko ,ko + n). D 

5.4 Proof of Lemma 14.41 

Let us recall (fOj) : 

Multiply this equation by Qs and integrate: 

{qs{t),q,{t)) - {qs{0),qM) + ^P?.Wf " ^P?.(0)f + J^ \\Aqe{s)fds 



(5.9) 



(5.10) 



/ \\'ieis)fds-T {g'' {x + qe)qe,qe)ds 
Jo Jo 



(5.11) 



Next, set 



Ve{t) = - qe{s)ds, 
Jo 
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so that Zs{t) = Avsit), and ^^(O) = 0. We rewrite (|4.ip as 



dt 



-TGeix + qe)+A*A{Ve-qe), 



dVe 

dt 



Consider the function 



Q{t) = -\\A{vS) - 1e{t))\? + t\\x + qe{t)\\n 



Then we have 



-^ = T{Ge {X + qe) , qe) " (^M {v^ - q^) , qe) + {A* A {Ve - qe) , Ve 

at 



(5.12) 
(5.13) 

(5.14) 



\qef + - — \\Avef - {Av,,Aqe 



I- ii9 II ii9 II- ii9 

lyell 2dt" ^" " ^" 



As z^{0) = 0, it fohows that 

\\A{ve{t) - q,{t))f + t\\x + g,(t)|bi - \\A{vM 



q^(0))f-r\\x + qM\k 



2 " ^ \\q,{s)fds + l^ \\ie{s)fds. 



This can be re-written as 



|z,(t)-ylg,(t)||^ + T||x + Q,(t)|bi+ / UsisWds 

Jo 



+ f \\Aq,is)fds + Co. (5.15) 
Jo 



Adding (|5.1ip and (|5.15p gives: 



{qe{t),qe{t)) + -\\Aq,{t)f + \\Zeit) - Aqeit)f + t\\x + qeit)\\ii + 



/■* Hz (t)P 

■ / (/ (X + qe) Qe, qe)ds + ^^ + C'„ 

Jo ^ 



(5.16) 



with the constant Cq that only depends on the initial data. Lemmas 14.1! and 14.2! imply then 

('7.(i),4e(t))+r||x + g(t)|bi =-r [ {g^x + qe)qe,qe)ds+r{t), (5.17) 

Jo 

with a uniformly bounded function r(t): 1^(^)1 < C. We claim that there exists C > that is 
independent of e and t so that 



< C. 



(5.18) 



/ {9Hx + qe)qe,qe)ds 
Jo 

Indeed, let us fix some 1 < j < n and look at 

^= / ff|j(x + ge(s))g£j(s)gej(s)ds = -^ / qej{s)q^j{s)ds = —^i\\qe,j{4)f -\\qej{sk)f). 
Jo ^ k=l-^'k ^^ k=l 

(5.19) 
Here {sk, s'^), /c = 1, . . . , Q, are the time intervals that qj{s) spends in the interval {—Xj — e, —Xj +e), 
and qj{sk) = Xj ±e, depending on whether qj enters this interval from above or below, and similarly 
for qe{s'f^)- It is easy to see that qj{s'f^) = qj{sk+i), whence (|5.19p is a telescoping sum, giving 



2e 
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As both terms in the right side above can take only the values —Xj it e, we conclude that |/| < C, 
so that ([ET8]) holds. Now, (jETT]) becomes 

{qe{t),qe{t))+T\\x + q,{t)y^ <C. (5.20) 

As ||a;|| < ||x||;i, using the traingle inequality, we obtain the following inequality for m^{t) = \\q£{t)\\: 

me{t)me(t) + Cim,{t) < Ca- 

Now, the comparison principle implies that ms{t) < C for all t > 0, and the proof of Lemma 14.41 is 
complete. D 

5.5 Proof of Lemma 14.51 

Let us choose tk and t^' as in the proof of Lemma 14.31 The estimate for [[^(^^(tfc)!! is exactly as in 
that Lemma. Next, dividing ()5.15p by C2A; = t'^. — t^ we get, due to the boundedness of Z£{t) and 

qe{t): 

r-t' 



j^ " \\qs{s)\\'ds <J + jI'' Uqe{s)fds <J + ^- (5-21) 



t'k - tk Jtk k k Jt^ 

It follows that there exists a time Sk G {tk-,t'k) such that ||Q'£(sfc)|| < C/yk. D 

5.6 Proof of Corollary 14.61 

This follows immediately from Lemma 14.51 and (|5.ip . as the latter implies that 

|2 , II A„ u\\\'2 ^ W;. t„ M|2 , || A„ I „ M|2 - ^ 



iSW + WMeiiW < UeiSuW + WAqsiSnW < ", (5.22) 

n 



for all t > Sn- O 

6 The proof of Theorem 12.21 

We will use Theorem 12.31 in order to prove Theorem 12.21 The role of the vector z that satisfies the 
sub-differential condition can be seen from the following lemma. 

Lemma 6.1 Suppose the sub- differential condition does not hold for a particular z. Then for this 
z we have a strict inequality 

h(z) = m.mF(x, z) < t\\x\\i,. (6-1) 

X 

Proof. Assume that z does not satisfy the sub-differential condition, that is, either 
(i) |[A*z]J > T for some i, or 
(ii) |[A*z]J < T, but [^*^;]j 7^ Tsign(xj) for some i such that Xi / 0. 

We will show that 

F{x + q,z) < F(x,z)=t\\x\\i^, (6.2) 

for some (sufficiently small) q, which implies (j6.ip . We will now construct q explicitly. 
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Assume first that (i) holds: | [^*2:]j | > r for some i. Then, set r = [A*z]^, and choose q so that 

e sign (r) , if /c = i, 



We have 



Qk = 

I 0, otherwise. 



-*- " • '|2 /A*^ „\ _ll;^ll I _^l;^ I „■ „ /-„\ I I;;:; l\ i ^ 1 1 /I „ 1 1 2 



F{x + q,z) =T\\x + q\\i^ + -||^g|| - {A*z,q) = t\\x\\i-^ + T{\xi + esign{r) \ - \xi\) + -\\Aq\\ - e\r\ 

< t\\x\\i^ +er - e\r\ + -||Ag|p < er||x||ii + er - e\r\ + Ce^ < r||x||jj, 

provided that we choose e sufficiently small. 

Similarly, if (ii) holds, pick some i such that Xj / but [^*2]j ^ Tsign(2;j). Assume first that 
[A*z]j = rsign(xi) with < |r| < r. Pick e £ {0,\xi\) and choose q with the components 

{— esignfxj), if k = i, , , 

0, otherwise. 

The computation is similar: 

F{x + q,z) = T\\x + q\\i-^ + -\\Aqf - {A*z,q) = t\\x\\i^ +T{\xi - esign (xj) | - \xi\) (6.4) 

1 1 

H — \\Aq\\'^ +er < t\\x\\i^ - et + er + -||Ag|p < eT||x||/^ - er + er + Ce'^ < t\\x\\i^, 

provided that e is sufficiently small. The last case case to consider is when (ii) holds, but [^*2;]j = 
— rsign(xj). We still choose q as in (j6.3p . and the computation is identical to (j6.4p . with r = — r. 
This completes the proof of Lemma 16.11 D 
Proof of Theorem 12.21 We trivially have 

h(z) = in.mF(x, z) < F(x, z) = rllxlk , 

X 

for all z. Thus, the conclusion of Theorem 12.21 would follow if we show that there exists z such that 
h{z) = t\\x\\i^. That is, we need to show that for any g 7^ and some z, we have 

F{x + q,z)= t\\x + q\\i^ + -pgf - {A*z,q) > F{x,z) = t\\x\\i,. (6.5) 



We claim that (j6.5p is true for any z that satisfies the sub-differential condition (j2.7p - recall that 
Theorem 12.31 implies that such z exists. Let z satisfy the sub-differential condition (|2.7p : 

[^*z]j = T signxj, if i £ Si, (6-6) 

\[A*z].\<T, if ie So. (6.7) 

We denoted here by Si the set of indices i such that Xi 7^ 0, and by 5*0 the set of indices i such that 
Xi = 0. 

The function F{x + q, z) is convex in q. Hence, it suffices to show that (7 = is a strict local 
minimum, that is, show that (16. 5p holds for q small enough. In particular, we may assume that 

sign {xi + qi) = sign (xj) , ifieSi. (6.8) 
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Now, we obtain from ()6.7p : 

T\qi\-[A*z]^qi>Q, ieSo, (6.9) 

while for i £ Si, we use <\6.8h and (16.61) to obtain 

T\xi + qi\ - [A*z]- qi=T (sgnxj)(xj + qi) - t (sgnxj)gi = r(sgnxi)xi = r |xj|, i G Si. (6.10) 

We deduce from (|6:9D - (l6lil that 

F{x + q,z) =T\\x + q\\i, + -\Aq\^ - {A*z,q) = ^(r|xi + gi| - [A*z]iq,)+ ^(r|gi|- [A*z]iq,) 

i€Si ieSo 

+ l\Aq\' > ^ r|x.| + ^-\Aq\' = t\\x\W + ^\Aq\'. (6.11) 

Therefore, we have F{x + q,z) > t\\x\\i-^ unless Aq = 0. However, if Aq = 0, then 

F{x + q,z) = T\\x + q\\i^ > r||x||z^, 
because x is the unique minimizer of (II. ip . Therefore, (16. Sh holds for all g. D 

7 Conclusions 

We have shown using ordinary differential equation methods that the relaxed h minimization al- 
gorithm for problems with underdetermined linear constraints converges independently of the reg- 
ularization parameter. In the examples in array imaging the observed convergence rates are faster 
than the theory implies, which means that more analysis is needed. The algorithm is robust to noise 
although we have not shown this theoretically. Finally, as the convergence rates are independent of 
dimension, generalization to the infinite-dimensional case is straightforward. 
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